5 research outputs found

    An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

    Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is one of the promising algorithms for fault tolerance due to its ability to adapt to both static and dynamic combinatorial optimization problems. However, the ACS algorithm does not consider resource fitness during task scheduling, which leads to poor load balancing and a lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) in grid computing, which focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of a dynamic evaporation rate, a resource fitness-based scheduling process, an enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases: identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving the fault tolerance algorithm, and evaluating the performance of the proposed algorithm. The proposed algorithm was developed in a simulated grid environment called GridSim and evaluated against other fault tolerance algorithms such as trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects, and the second best in terms of load balancing. As the failure rate increases, DAFTS achieved the smallest increase in execution time, average makespan and average latency, by 7%, 11% and 5% respectively, and the smallest decrease in throughput and execution success rate, by 6.49% and 9% respectively. As the number of jobs increases, DAFTS also achieved the smallest increment in execution time, average makespan and average latency, by 5.8, 8.5 and 8.7 times respectively, and the highest increase in throughput and the highest execution success rate, by 72.9% and 93.7% respectively. The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults.
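    The abstract above names resource fitness-based scheduling and a pheromone update but gives no formulas. As a rough illustration only, the sketch below uses the textbook ACS pseudo-random proportional rule and local pheromone update, with resource fitness standing in for the heuristic term. The function names and the parameters `q0`, `beta`, `rho` and `tau0` are assumptions for this example, not the DAFTS definitions; the trust factor, suspension and dynamic evaporation rate are not modelled.

```python
import random


def choose_resource(pheromone, fitness, q0=0.9, beta=2.0, rng=random):
    """Pick a resource index with the standard ACS pseudo-random proportional rule.

    pheromone[i] and fitness[i] describe resource i. Fitness is used here as
    the heuristic term (an assumption made for this sketch).
    """
    scores = [t * (f ** beta) for t, f in zip(pheromone, fitness)]
    if rng.random() < q0:
        # Exploitation: take the resource with the highest score.
        return max(range(len(scores)), key=scores.__getitem__)
    # Exploration: sample a resource with probability proportional to its score.
    threshold = rng.random() * sum(scores)
    running = 0.0
    for i, score in enumerate(scores):
        running += score
        if threshold <= running:
            return i
    return len(scores) - 1


def local_update(pheromone, i, rho=0.1, tau0=0.01):
    """Standard ACS local update: evaporate pheromone on the chosen resource."""
    pheromone[i] = (1 - rho) * pheromone[i] + rho * tau0


pheromone = [1.0, 1.0, 1.0]
fitness = [0.2, 0.9, 0.5]
chosen = choose_resource(pheromone, fitness, q0=1.0)  # q0=1.0 forces exploitation
local_update(pheromone, chosen)
```

    Lowering the pheromone on the resource that was just chosen makes it slightly less attractive to the next ant, which is the mechanism ACS uses to spread choices across candidates.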

    Load Balancing Using Dynamic Ant Colony System Based Fault Tolerance in Grid Computing

    Load balancing is often disregarded when implementing fault tolerance capability in grid computing. Effective load balancing ensures that a fair amount of load is assigned to each resource based on its fitness, rather than assigning a majority of tasks to the fittest resources. Proper load balancing in a fault tolerance system would also reduce the bottleneck at the fittest resources and increase utilization of other resources. This paper presents a fault tolerance algorithm based on ant colony system that considers load balancing based on resource fitness, with resubmission and checkpoint techniques, to improve fault tolerance capability in grid computing. Experimental results indicated that the proposed fault tolerance algorithm has better execution time, throughput, makespan, latency, load balancing and success rate.
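    To make "a fair amount of load ... based on its fitness" concrete, the sketch below splits a batch of tasks across resources in proportion to their fitness, using largest-remainder rounding. This is one simple reading of the idea, not the paper's scheduling rule; `proportional_shares` and its arguments are hypothetical names.

```python
def proportional_shares(num_tasks, fitness):
    """Split num_tasks across resources in proportion to their fitness values."""
    total = sum(fitness)
    exact = [num_tasks * f / total for f in fitness]
    shares = [int(x) for x in exact]  # round every share down first
    leftover = num_tasks - sum(shares)
    # Hand the remaining tasks to the resources with the largest fractional parts.
    order = sorted(range(len(fitness)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


shares = proportional_shares(10, [1, 1, 2])
```

    Under this split the fittest resource receives the largest share but not the majority of every batch, so less fit resources stay utilized.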

    Load balancing using dynamic ant colony system based fault tolerance in grid computing

    Load balancing is often disregarded when implementing fault tolerance capability in grid computing. Effective load balancing ensures that a fair amount of load is assigned to each resource based on its fitness, rather than assigning a majority of tasks to the fittest resources. Proper load balancing in a fault tolerance system would also reduce the bottleneck at the fittest resources and increase utilization of other resources. This paper presents a fault tolerance algorithm based on ant colony system that considers load balancing based on resource fitness, with resubmission and checkpoint techniques, to improve fault tolerance capability in grid computing. Experimental results indicated that the proposed fault tolerance algorithm has better execution time, throughput, makespan, latency, load balancing and success rate.

    Fault tolerance grid scheduling with checkpoint based on ant colony system

    Task resubmission and checkpointing are among several popular techniques used to provide fault tolerance in grid computing. However, because side-by-side comparisons are lacking, it is unclear which technique provides fault tolerance capability without degrading system performance. This study proposed Dynamic ACS-based Fault Tolerance in grid computing, using resubmission to a new resource, a checkpoint technique and resource execution history, with the aim of reducing execution and task processing time and increasing the success rate in a grid environment. The proposed algorithm is compared with other relevant algorithms to measure performance in terms of execution time, success rate and average processing time. The results suggest that the proposed algorithm, with improved task resubmission, checkpointing and an extended pheromone update formula, gives better performance in managing execution failure as well as in resource selection during task assignment or resubmission.
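    The difference between plain resubmission and checkpoint-based resubmission can be shown with a small accounting sketch: after a failure, a checkpointed task resumes from its last saved point instead of from zero. The model below is a simplified assumption (abstract work units, failures given as progress positions, no checkpoint overhead) and is not the simulation setup used in the study; `simulate` and its parameters are hypothetical.

```python
def simulate(total_work, checkpoint_interval, failure_points):
    """Return (work units executed, resubmissions) for one task.

    failure_points lists, in increasing order, the progress positions at which
    the current resource fails. checkpoint_interval=None means no checkpoints,
    so every resubmission restarts from zero.
    """
    saved = 0          # progress preserved by the last checkpoint
    executed = 0       # total work performed, including repeated work
    resubmissions = 0
    for fail_at in failure_points:
        if fail_at <= saved or fail_at >= total_work:
            continue   # this failure does not interrupt the current attempt
        executed += fail_at - saved
        if checkpoint_interval:
            # Roll back to the most recent checkpoint boundary.
            saved = (fail_at // checkpoint_interval) * checkpoint_interval
        else:
            saved = 0
        resubmissions += 1
    executed += total_work - saved  # final attempt runs to completion
    return executed, resubmissions


with_checkpoint = simulate(12, 5, [7])
without_checkpoint = simulate(12, None, [7])
```

    In this toy case one failure at progress 7 costs 14 work units with a checkpoint every 5 units and 19 without, which is the kind of saving in execution time the abstract refers to.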

    Dynamic ACO-based fault tolerance in grid computing

    No full text
    When scheduling jobs in the distributed conditions of grid computing, it is nearly impossible to have a completely fault-free system. It is therefore important to integrate fault tolerance capability so that the system can continue to run even in the presence of failures, in addition to improving the scheduling process and reducing the possibility of faults. Typically, load balancing is not considered in the presence of failures, and this may lead to an inefficient scheduling process despite a good fault tolerance strategy. This paper presents an ant-based fault tolerance algorithm that uses checkpoint and resubmission techniques, with consideration of execution history in the pheromone updating process, to enhance fault tolerance capability. Experimental results showed that the proposed algorithm performs better than other relevant algorithms in terms of execution time, success rate, and average turnaround time per job.